Scaling of folding properties of proteins is studied in a toy system - the lattice Go model with
various two- and three-dimensional geometries of the maximally compact native states.
Characteristic folding times grow as power laws with the system size. The corresponding exponents
are not universal. Scaling of the thermodynamic stability also indicates size-related
deterioration of the folding properties.

PACS numbers: 71.28.-hd, 71.27.-ha 
Recent advances in the understanding of protein folding
have been made, to a large extent, through studies of
lattice heteropolymers with a small number of beads, N.
In these toy models of proteins, the beads represent
amino acids. Lattice models allow for an exact determination
of the native state, i.e. of the ground state of the
system, and are endowed with simplified dynamics. An
N of order 125 is considered to be large in such studies,
and then special sequences are considered. There are
real-life proteins with N as small as of order 30, but
most of them are built of several hundred amino acids.
Apparently there is no protein with N exceeding 5000,
which is orders of magnitude smaller than the number of
base pairs in a DNA molecule. The question we ask in this
Letter is: how do folding properties of proteins scale with N,
and can they lead to a deterioration in stability and kinetic
accessibility of the native state that exceeds the bounds
of functionality?

A previous numerical analysis of the scaling has been
done by Gutin, Abkevich, and Shakhnovich [5], who studied
three-dimensional (3D) lattice sequences with N up
to 175. For each N, they considered 5 sequences and
selected the one that folded the fastest at its optimal
temperature $T_{min}$. The corresponding folding time, $t_{01}$, was
the quantity that was used in studies of scaling. They
discovered that $t_{01}$ grows as a power law with the system
size:

$$t_{01} \sim N^{\lambda} . \qquad (1)$$
The exponent $\lambda$ was found to be non-universal - it depended
on the kind of distribution of the contact energies
$B_{ij}$ in the Hamiltonian

$$H = \sum_{i<j} B_{ij} \Delta_{ij} , \qquad (2)$$

which pointed to the existence of a variety of kinds of
energy landscapes [6]. In eq. (2), $\Delta_{ij}$ is either 1 or 0,
depending on whether or not the monomers i and j face each
other without being neighbors along the chain. For random and
designed sequences, with the $B_{ij}$'s generated from the
data base of Ref. [7], $\lambda \approx 6$ and $\approx 4$, respectively.
Finally, for the Go model [8], in which $B_{ij} = -1$ for native
contacts and 0 for non-native contacts, $\lambda \approx 2.7$. There
were also phenomenological arguments which suggested
that the folding times scale with N exponentially
for all temperatures. Thus the nature of the scaling laws
for the folding times remains puzzling. Perhaps more
importantly, Gutin et al. [5] studied neither the scaling of any of
the characteristic temperatures that are relevant for folding
nor the effects of the dimensionality.

In this Letter, we report on studies of the 2D and 3D Go
model, with N up to 56 and 100, respectively. In the 2D,
N=16 case, we consider all 37 maximally compact
conformations (there are 69 such conformations but only
38 of them are distinct due to the end-to-end symmetry
of the Go model; furthermore, one conformation cannot
be accessed kinetically). In the remaining cases, we study
15 conformations, except for N=80 and 100, when only
10 and 5, respectively, are considered. Note that each of
these structures is equally designable within the model
because each is a nondegenerate ground state of one Go
sequence. We demonstrate that in this case $t_{01}$ is indeed
given by eq. (1). In 2D, $\lambda$ is 5.9 ± 0.2. Thus the
constraint for the heteropolymer to lie in a plane increases
$\lambda$ compared to the 3D Go model. Our larger statistics
also allow us to study median values, not just minimal ones,
of the folding times. The median values also follow the
power law, with an effective $\lambda$ of 6.3 ± 0.2 and 3.1 ± 0.1
in 2D and 3D, respectively. Actually, the effective $\lambda$
depends on whether the folding is studied at $T_{min}$ or at
the folding temperature $T_f$. $T_f$ is defined operationally
as the temperature at which the equilibrium probability of
finding the native state is 1/2. We find that in 2D and at
$T_f$, $\lambda$ = 6.6 ± 0.1 (the exponent for the minimal time at
$T_f$ is 6.3 ± 0.3), which means that by moving away from
conditions which are optimal for the folding kinetics one
generates a somewhat increased exponent in the power
law.
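The operational definition of the folding temperature can be illustrated with a minimal Python sketch. It assumes that the energy spectrum of a target is available as a list of (energy, degeneracy) pairs, as an exact enumeration would provide, and that temperature is measured in units in which the Boltzmann constant is 1. The function names and the two-level spectrum in the usage note are hypothetical and are not taken from the present study.

```python
import math

def native_probability(levels, T):
    """Equilibrium probability of the native state at temperature T.

    levels: list of (energy, degeneracy) pairs; the first entry is assumed
    to be the native (lowest-energy) state. Units with k_B = 1 are assumed.
    """
    e0, g0 = levels[0]
    # Shift energies by the native energy so that no exponential can overflow.
    z = sum(g * math.exp(-(e - e0) / T) for e, g in levels)
    return g0 / z

def folding_temperature(levels, t_lo=1e-3, t_hi=10.0, tol=1e-10):
    """Bisect for the temperature at which the native probability equals 1/2.

    Assumes the probability exceeds 1/2 at t_lo and is below 1/2 at t_hi;
    it decreases monotonically with T when the native state is the ground state.
    """
    while t_hi - t_lo > tol:
        mid = 0.5 * (t_lo + t_hi)
        if native_probability(levels, mid) > 0.5:
            t_lo = mid   # still mostly native: the crossing lies at higher T
        else:
            t_hi = mid
    return 0.5 * (t_lo + t_hi)
```

For a hypothetical spectrum with one native state at energy -9 and 20 states at energy -8, `folding_temperature([(-9.0, 1), (-8.0, 20)])` should agree with the closed-form value 1/ln 20, since the native probability is then 1/(1 + 20 exp(-1/T)).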

Notice that good folding takes place for sequences for
which $T_f$ is comparable to or bigger than $T_{min}$. Otherwise
the folding is poor. An important novel aspect of
our research is that we determine the scaling properties
of $T_{min}$ and those of the folding temperature, $T_f$. We
conclude that, both in 2D and 3D, there are indications
that there could be a size-related limit to good foldicity.
We find that $T_{min}$ grows linearly with N whereas $T_f$ first
grows like $T_{min}$ but then falls off and possibly saturates
asymptotically. This makes the gap between $T_{min}$ and
$T_f$ increase linearly with N asymptotically, which would
change the folding kinetics from excellent to bad.

One stumbling block in studies of scaling of random
systems is the necessity to compare quantities which are
averaged statistically and to have some control of the
statistical ensemble used. The advantage of the Go model
is that there is no randomness in the values of the
contact energies and the ensemble is generated by the set of
possible maximally compact conformations that can act
as native states - i.e. the variety is only due to the
geometry of the native states. The advantage of studying
2D models is that, for N=16, it is feasible to determine
the full distribution of $T_f$, $T_{min}$, and of the folding times
among all of the 37 targets and then to realize that the
median folding time probes the vicinity of the peak in the
distribution. Thus, on going to larger N and taking, as
we usually do, 15 targets, it is reasonable to expect that
the corresponding median time still probes the peaks of
good foldicity. Median quantities are, in addition, more
stable statistically, in general, whenever one deals with
wide distributions.
FIG. 1. The dependence of the median folding time, $t_{fold}$,
on T for the 2D Go conformation shown in the center of
the figure. The results are averaged over 1000 Monte Carlo
trajectories. The inset shows the T-dependence of the probability
for the sequence to be in the native state, as obtained through
an exact evaluation of the partition function, which involves
802075 conformations.

As to the selection of the 15 native maximally compact
targets: in 2D, 10 were obtained by a random construction
and 5 were obtained by a multiple quenching of
randomly shaped homopolymers until a maximally compact
conformation was obtained. The homopolymers had
identical attraction in each possible contact. In both
methods, we generate targets to which there is a path
of kinetic access. In 3D, all targets were obtained by the
random construction.

Figure 1 illustrates definitions of quantities that will be
studied here. It shows the dependence of the median folding
time, $t_{fold}$, on temperature for one target. The target
has N of 16 and is shown in the center of the figure. The
optimal temperature, $T_{min}$, is where $t_{fold}$ is the shortest.
$T_{min}$ signifies the onset of glassy kinetics. This quantity
is better suited to study scaling than the glass transition
temperature $T_g$ [11] because the latter involves a cutoff
time which necessarily must be N-dependent. $t_{fold}$ at
$T_{min}$ will be denoted by $\tau_1$. $\tau_2$ is defined to be $t_{fold}$ at
$T_f$ ($T_f$ is larger than $T_{min}$ for the target shown in Figure
1). In the statistical ensemble, $t_1$ is defined to be the
median value of $\tau_1$ and $t_2$ the median value of $\tau_2$. We
also study $t_{01}$ and $t_{02}$, which are the minimal values of $\tau_1$
and $\tau_2$ among the targets considered.
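The chain of definitions above (per-target optimum, then ensemble median and minimum) can be summarized in a short Python sketch. It assumes that, for each target, the median folding time has already been tabulated on a grid of temperatures; the function names and the numbers in the usage note are hypothetical illustrations, not data from this study.

```python
import statistics

def optimal_point(tfold_by_T):
    """Return (T_min, tau_1) for one target.

    tfold_by_T: dict mapping temperature -> median folding time t_fold.
    T_min is the grid temperature at which t_fold is the shortest,
    and tau_1 is t_fold at that temperature.
    """
    T_min = min(tfold_by_T, key=tfold_by_T.get)
    return T_min, tfold_by_T[T_min]

def ensemble_times(tau_values):
    """Return (median, minimum) of per-target times.

    Applied to the tau_1 values this gives (t_1, t_01);
    applied to the tau_2 values it gives (t_2, t_02).
    """
    return statistics.median(tau_values), min(tau_values)
```

For instance, with a hypothetical curve `{0.3: 900.0, 0.4: 410.0, 0.5: 350.0, 0.6: 520.0}`, `optimal_point` returns the pair (0.5, 350.0). The resolution of $T_{min}$ is limited by the temperature grid, so a finer grid near the minimum would be needed in practice.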

FIG. 2. The distribution of folding times at $T_{min}$ (solid
lines) and at $T_f$ (dotted lines) for the 2D N=16 case. The
arrows indicate their median, mean, and minimal values. The
inset shows the distributions of $T_{min}$ and $T_f$.

The folding times were obtained through a Monte
Carlo procedure that satisfies the detailed balance condition
and was motivated by studies presented in Ref.
[12]. For each conformation of the polymer, one first
determines the number of possible single- and double-monomer
(crankshaft) moves - these numbers will be
denoted here by $A_1$ and $A_2$, respectively. The maximum
value of $A_1 + A_2$, among all conformations, is equal to
$A_{max} = N + 2$. The probability to attempt a single-monomer
move is taken to be $r A_1 / A_{max}$ (r = 0.2). For a
double-monomer move it is $(1 - r) A_2 / A_{max}$. The attempts are
rejected or accepted as in the standard Metropolis method.
The folding time is defined as the first passage time and
is measured by the number of Monte Carlo attempts
divided by $A_{max}$. For N > 16, it is determined based on
50 to 200 trajectories. It should be noted that Ref. [5]
does not specify whether the detailed balance condition
was enforced.
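The move-attempt rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the two attempt probabilities do not sum to one, and the sketch treats the remainder as a null step in which no move is attempted while the clock still advances, which is our reading of the procedure rather than something stated explicitly. The function names are hypothetical, the energy change `delta_e` is assumed to be computed elsewhere, and units with the Boltzmann constant equal to 1 are assumed.

```python
import math
import random

def attempt_probabilities(a1, a2, n, r=0.2):
    """Probabilities to attempt a single- or a double-monomer move.

    a1, a2: numbers of possible single and double (crankshaft) moves
    in the current conformation; n: number of beads, so A_max = n + 2.
    """
    a_max = n + 2
    if a1 + a2 > a_max:
        raise ValueError("A_1 + A_2 cannot exceed A_max = N + 2")
    return r * a1 / a_max, (1.0 - r) * a2 / a_max

def choose_move_type(a1, a2, n, r=0.2, rng=random):
    """Pick 'single', 'double', or None (assumed null step) for one attempt."""
    p_single, p_double = attempt_probabilities(a1, a2, n, r)
    u = rng.random()
    if u < p_single:
        return "single"
    if u < p_single + p_double:
        return "double"
    return None  # assumption: nothing is attempted, but time still advances

def metropolis_accept(delta_e, T, rng=random):
    """Standard Metropolis test for a proposed move with energy change delta_e."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / T)
```

Because the normalization $A_{max}$ is the same for every conformation, the probability of proposing a particular move does not depend on how many moves the current conformation allows, which is what makes the forward and reverse proposals symmetric.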
FIG. 3. The dependence of folding times on N. $t_1$ and $t_2$
are the median folding times at $T_{min}$ and $T_f$, respectively; $t_{01}$
and $t_{02}$ are the corresponding minimal folding times found. $t'_1$
is the median folding time at $T_{min}$ as obtained by a
straightforward Monte Carlo procedure which does not enforce the
detailed balance condition. The values of the effective
exponent $\lambda$ for the 2D case are: 7.1 ± 0.1, 6.6 ± 0.1, 6.3 ± 0.2,
6.3 ± 0.3, and 5.9 ± 0.2 when counting clockwise. For 3D, the
slopes are 3.1 ± 0.1 and 2.9 ± 0.1.

Figure 2 shows the distribution of $\tau_1$ and $\tau_2$ for all
targets with N=16. There is a substantial scatter in
these values, so the usage of the median $t_1$ appears to be
justified. The inset shows the corresponding distributions
of $T_f$ and $T_{min}$. Both are centered and the median
and mean values almost coincide. Note that there is very
little variation in $T_f$: all Go targets with N=16 have
almost identical stability properties: $T_f$ varies between
0.489 and 0.565. On going to larger N's, the distributions
of $\tau_1$ remain clustered around $t_1$ but the long-time tail
appears to extend towards longer and longer times. This
results in an overall flattening of the distributions on the
scale set by $t_1$. For N=16, the exact distribution of $\tau_1/t_1$
ends at about 8, whereas our sampling of N=20 and 42
yields tails in $\tau_1/t_1$ which are located at around 16 and
10, respectively. Within our statistics, we have not
spotted any relatively long-lasting folding processes for other
values of N. However, their very existence for N=20 and
42 suggests an emergence of the tails in distributions if
those could be sampled fully.

Figure 3 summarizes our results on the scaling of folding
times. It demonstrates the validity of the power laws
both for the median and for the minimal folding times.
The effective exponents $\lambda$ depend on T, i.e., they depend
on whether the kinetics was monitored at $T_{min}$ or $T_f$.
This dependence is not substantial but it indicates variations
of the free energy landscape with T and underscores
a more general lack of universality.
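An effective exponent of a power law of the form of eq. (1) is commonly estimated as the slope of a straight-line fit on a log-log plot. The Python sketch below shows one such estimate by ordinary least squares; it is not necessarily the fitting procedure used for Figure 3, it does not produce error bars, and the function name and the synthetic data in the usage note are hypothetical.

```python
import math

def fit_power_law(sizes, times):
    """Least-squares fit of log(t) = log(c) + lam * log(N).

    Returns (lam, c): the effective exponent and the prefactor
    of the assumed form t = c * N**lam.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    lam = sxy / sxx                          # slope on the log-log plot
    c = math.exp(y_bar - lam * x_bar)        # intercept converted to a prefactor
    return lam, c
```

As a check on synthetic data, feeding the function times generated as `2.0 * n ** 3.1` for a handful of sizes should recover an exponent of 3.1 and a prefactor of 2.0 to within rounding error.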

The generic power laws obtained by Gutin et al. [5]
and by us contradict the exponential laws derived in the
random energy model [6]. They support a generally
accepted view that the folding process is a finite volume
version of the first order transition. In this picture
one may visualise the transition stage as an inhomogeneous
mixture of the "new" phase in the sea of an "old"
phase. The random energy model does not capture
such inhomogeneities.
FIG. 4. The dependence of $\langle T_{min} \rangle$ and $\langle T_f \rangle$ on N.
$\langle T_{min} \rangle$ is fitted by a linear function: 0.44 + 0.0053 N and
0.505 + 0.0015(7) N for 2D and 3D, respectively. The results are
averaged over the conformations that were used in the studies
of dynamics.

Figure 4 shows the N-dependence of the characteristic
temperatures. For both 2D and 3D, $T_{min}$ grows linearly
with N whereas $T_f$ shows a more complex behavior. For
N > 16, $T_f$ is determined from the Monte Carlo simulations:
a) we vary T in steps of 0.05 or smaller, b) at
each T we start from the native state and monitor the
probability of occupying it, c) in most cases the results
are averaged over 50 different trajectories. The number
of Monte Carlo steps for each T depends on N and it
ranges from 10^ to 7x10^. We checked that doubling the
selected cutoff times had negligible effect on $T_f$. The
procedure yields results which agree with those obtained by
the exact enumeration for N=16. In 2D, the dependence
of $\langle T_f \rangle$ on N initially follows that of $T_{min}$. However,
on crossing $N_c$ of 36, $T_f$ falls off and it may saturate,
which is suggested by the declining rate of growth. Thus
2D Go conformations appear to have intrinsic limits to
their thermodynamic stability. Beyond $N_c$ the foldicity
becomes gradually poorer and poorer. The same scenario
appears to be present also in the 3D case, except that the
small-N value of $T_f$ is substantially larger than $T_{min}$. $T_f$
starts showing signs of saturation around N=80. We
were unable to explore values of N that were larger than
100, but a saturation of $T_f$ is expected on general grounds
due to the existence of the (first order) phase transition
to the folded phase in the thermodynamic limit. $T_{min}$,
on the other hand, is expected to grow indefinitely due
to the growth of kinetic barriers to cross. In 3D, $T_f$ and
$T_{min}$ appear to cross somewhere around $N_c$ = 300.

In conclusion, we have studied the scaling properties
not only of the fastest sequences, as in Ref. [5], but also
of those with typical folding rates. The exponents in the
resulting power laws for the folding times depend on D,
on the values of the $B_{ij}$'s, and on T. In addition to the
deterioration of the folding kinetics with N, as described
by the growth of $T_{min}$ and of the folding times, a
relative deterioration of the thermodynamic stability also
appears to set in. Thus there will be no rapidly folding
heteropolymers of a large size. It would be interesting to
determine the scaling properties for more realistic models
of proteins.

This work was supported by KBN (Grant No. 2P03B- 
025-13). Fruitful discussions with Jayanth R. Banavar 
and Dorota Cichocka are gratefully acknowledged. 



[1] K. A. Dill et al., Protein Science 4, 561 (1995).

[2] A. Sali, E. Shakhnovich, and M. Karplus, J. Mol. Biol.
235, 1614-1636 (1994).

[3] A. R. Dinner, A. Sali, and M. Karplus, Proc. Natl. Acad.
Sci. USA 93, 8356-8361 (1996).

[4] See e.g. T. E. Creighton, Proteins: Structures and Molecular
Properties, W. H. Freeman and Company, New York,
1993.

[5] A. M. Gutin, V. I. Abkevich, and E. I. Shakhnovich,
Phys. Rev. Lett. 77, 5433 (1996).

[6] J. D. Bryngelson, J. N. Onuchic, N. D. Socci, and P. G.
Wolynes, Proteins 21, 167 (1995).

[7] S. Miyazawa and R. I. Jernigan, Macromolecules 18, 534
(1985).

[8] N. Go and H. Abe, Biopolymers 20, 1013 (1981).

[9] M. Cieplak, M. Henkel, J. Karbowski, and J. R. Banavar,
Phys. Rev. Lett. 80, 3654 (1998); M. Cieplak, M. Henkel,
and J. R. Banavar, J. Cond. Matt. (in press).

[10] J. D. Bryngelson and P. G. Wolynes, J. Chem. Phys. 93,
6902 (1989); E. Shakhnovich and A. M. Gutin, Europhys.
Lett. 9, 569 (1989); J. Saven, J. Wang, and P. Wolynes,
J. Chem. Phys. 101, 11037 (1994).

[11] N. D. Socci and J. N. Onuchic, J. Chem. Phys. 101, 1519
(1994); see also M. Cieplak and J. R. Banavar, Fold. Des.
2, 235 (1997).

[12] H. S. Chan and K. A. Dill, J. Chem. Phys. 99, 2116
(1994); H. S. Chan and K. A. Dill, J. Chem. Phys. 100,
9238 (1994).

[13] D. Thirumalai, J. Phys. I (France) 5, 1457 (1995).

[14] E. M. Lifshitz and L. P. Pitaevskii, Physical Kinetics
(Pergamon, London, 1981).






